Backport of docs - remove use of consul leave during upgrade instructions as it caused leadership changes into release/1.13.x #17770
Backport
This PR is auto-generated from #17758 to be assessed for backporting due to the inclusion of the label backport/1.13.
The below text is copied from the body of the original PR.
Note: this needs to be updated on all versions of docs, so backport labels for 1.13 - 1.16 are in place. I will manually cherry-pick this into PRs to the docs release branches prior to 1.13.
Description
A customer ran into an issue where leadership elections occurred multiple times for each server that they were upgrading, when the initial goal of the process is to ensure the leader is upgraded last. This was caused by the use of `consul leave` during the upgrade process as they upgraded from consul to consul enterprise.

When upgrading, it is important that the leader goes last, so that the leader is replicating raft logs on the lower consul version to servers that are either at the same level or at a higher level and are aware of all fields that are within the raft log.
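The "leader goes last" rule can be sketched as a small ordering helper. This is only an illustration: the `upgrade_order` function and the server names are hypothetical, and the role of each server is assumed to have been read beforehand (for example from the `State` column of `consul operator raft list-peers`).

```python
from typing import Dict, List


def upgrade_order(roles: Dict[str, str]) -> List[str]:
    """Return server names with followers first and the leader last.

    `roles` maps a server name to its raft role ("leader" or "follower").
    """
    leaders = [name for name, role in roles.items() if role == "leader"]
    if len(leaders) != 1:
        # Without exactly one known leader, the ordering is not meaningful.
        raise ValueError(f"expected exactly one leader, found {len(leaders)}")
    followers = sorted(name for name, role in roles.items() if role != "leader")
    return followers + leaders


# Hypothetical three-server cluster.
roles = {"server-a": "follower", "server-b": "leader", "server-c": "follower"}
print(upgrade_order(roles))  # ['server-a', 'server-c', 'server-b']
```

If leadership changes part-way through the upgrade, an ordering computed up front no longer holds, which is why the unplanned elections described in this PR are a problem.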
When using consul leave during the upgrade process, the following was observed.
Observed when shutting down
The following occurred when `consul leave` was issued: the `term` index of the server being upgraded kept increasing (ex: cluster has a term of 100 and server being upgraded has a term of 104) until it shut down.

This happened on multiple servers, and the server being upgraded had a `term` that was several greater than that of the leader and the rest of the cluster.

At this point the server is shut down and has the new consul binary.
Observed when restarting
The instructions then have the user start the server using something like `systemctl start`. At this point, a loop of losing leadership, starting new elections, and electing a new leader was observed.

This loop will continue until the `term` of the cluster matches the `term` of the upgraded server. In the example previously mentioned, where the cluster had a `term` of 100 and the upgraded server had a `term` of 104, this loop would occur 4 times.

At this point, the upgrade process has gone through multiple leader elections and has been destabilized: it is highly probable that the leader is now a different server, so the upgrade is compromised and not set up for success.
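The arithmetic behind the "4 times" figure can be shown with a toy model. This is a minimal sketch, not Consul's raft implementation: it assumes, as the description above does, that each round of the loop advances the cluster `term` by exactly one. The function name is hypothetical.

```python
def elections_until_converged(cluster_term: int, rejoining_term: int) -> int:
    """Count election rounds until the cluster term reaches the rejoining server's term.

    Simplified model: every round, the leader loses leadership and a new
    election raises the cluster term by one.
    """
    elections = 0
    while cluster_term < rejoining_term:
        cluster_term += 1  # a new election bumps the cluster term
        elections += 1
    return elections


# Example from the description: cluster at term 100, upgraded server at term 104.
print(elections_until_converged(100, 104))  # 4
```

Under this model, a server that rejoins with the same term as the cluster causes no extra elections at all.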
Overview of commits